{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfGpInivS0fG"
   },
   "source": [
    "<h2 align=\"center\">点击下列图标在线运行HanLP</h2>\n",
    "<div align=\"center\">\n",
    "\t<a href=\"https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/keyphrase_restful.ipynb\" target=\"_blank\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\t<a href=\"https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fkeyphrase_restful.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" alt=\"Open In Binder\"/></a>\n",
    "</div>\n",
    "\n",
    "## 安装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IYwV-UkNNzFp"
   },
   "source": [
    "无论是Windows、Linux还是macOS，HanLP的安装只需一句话搞定："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Uf_u7ddMhUt"
   },
   "outputs": [],
   "source": [
    "!pip install hanlp_restful -U"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pp-1KqEOOJ4t"
   },
   "source": [
    "## 创建客户端"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4M7ka0K5OMWU",
    "outputId": "d74f0749-0587-454a-d7c9-7418d45ce534"
   },
   "outputs": [],
   "source": [
    "from hanlp_restful import HanLPClient\n",
    "HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh') # auth不填则匿名，zh中文，mul多语种"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BMW528wGNulM"
   },
   "source": [
    "#### 申请秘钥\n",
    "由于服务器算力有限，匿名用户每分钟限2次调用。如果你需要更多调用次数，[建议申请免费公益API秘钥auth](https://bbs.hanlp.com/t/hanlp2-1-restful-api/53)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "elA_UyssOut_"
   },
   "source": [
    "## 关键词提取\n",
    "关键词（短语）提取的目标是文本中最具有代表性的关键词以及短语。\n",
    "### 中文\n",
    "关键词提取任务的输入为一段文本和所需的关键词数量`topk`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "BqEmDMGGOtk3",
    "outputId": "936d439a-e1ff-4308-d2aa-775955558594"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'自然语言处理': 0.800000011920929,\n",
       " 'HanLP的全部性能': 0.5256577134132385,\n",
       " '一门博大精深的学科': 0.42154020071029663}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "HanLP.keyphrase_extraction('自然语言处理是一门博大精深的学科，掌握理论才能发挥出HanLP的全部性能。 '\n",
    "                           '《自然语言处理入门》是一本配套HanLP的NLP入门书，助你零起点上手自然语言处理。', topk=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jj1Jk-2sPHYx"
   },
   "source": [
    "返回值为`topk`个关键词以及相应的权重，权重取值区间为$[0, 1]$。"
   ]
  },
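  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the output above shows, the result maps each keyphrase to its weight, so it can be post-processed with ordinary Python. The cell below is a minimal sketch and not part of HanLP: `filter_keyphrases` and `min_weight` are hypothetical names introduced here, and `sample_scores` copies the output shown above so that the cell runs without calling the server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_keyphrases(scores, min_weight=0.5):\n",
    "    # Hypothetical helper: keep keyphrases whose weight is at least min_weight,\n",
    "    # returned as (keyphrase, weight) pairs sorted from highest to lowest weight.\n",
    "    kept = [(k, w) for k, w in scores.items() if w >= min_weight]\n",
    "    return sorted(kept, key=lambda kw: kw[1], reverse=True)\n",
    "\n",
    "# Copied from the output above, so no request is sent to the server.\n",
    "sample_scores = {\n",
    "    '自然语言处理': 0.800000011920929,\n",
    "    'HanLP的全部性能': 0.5256577134132385,\n",
    "    '一门博大精深的学科': 0.42154020071029663,\n",
    "}\n",
    "filter_keyphrases(sample_scores, min_weight=0.5)"
   ]
  },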
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关键词提取并不仅限于短文本，长文章也一样支持："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'新冠病毒核酸阳性感染': 0.888239324092865,\n",
       " '确诊病例': 0.8868124485015869,\n",
       " '本土无症状感染者': 0.8557102680206299,\n",
       " '属地社区（村屯）': 0.8164600133895874,\n",
       " '疫情防控工作': 0.7749382853507996,\n",
       " '我市疫情防控要求': 0.7502512335777283,\n",
       " '症状': 0.669366180896759,\n",
       " '我市疫情形势': 0.6673010587692261,\n",
       " '感染': 0.6663177013397217,\n",
       " '本土确诊病例': 0.6464788317680359}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "doc = '''\n",
    "4月15日0-24时，长春市新增本土确诊病例157例（含57例无症状感染者转为确诊病例），新增本土无症状感染者407例。\n",
    "以上人员均为隔离管控期间筛查新冠病毒核酸阳性感染者。\n",
    "当前我市疫情形势严峻，为做好全市疫情防控工作，尽快恢复正常社会秩序和经济社会发展，长春市新冠肺炎疫情防控工作领导小组办公室提醒广大市民，\n",
    "请严格遵守我市疫情防控要求，配合各部门落实好防控措施，进一步提高防范意识，坚持规范戴口罩、勤洗手、常通风、保持社交距离、不聚餐、不聚集，\n",
    "减少疾病感染风险。一旦出现发热、干咳、乏力、咽痛、嗅味觉减退或丧失等不适症状，应及时向属地社区（村屯）或疾控机构报告。\n",
    "'''\n",
    "HanLP.keyphrase_extraction(doc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 可视化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "4月15日0-24时，长春市新增本土<span style=\"background-color:rgba(255, 255, 0, 0.8868124485015869);\">确诊病例</span>157例（含57例无<span style=\"background-color:rgba(255, 255, 0, 0.669366180896759);\">症状</span><span style=\"background-color:rgba(255, 255, 0, 0.6663177013397217);\">感染</span>者转为<span style=\"background-color:rgba(255, 255, 0, 0.8868124485015869);\">确诊病例</span>），新增<span style=\"background-color:rgba(255, 255, 0, 0.8557102680206299);\">本土无<span style=\"background-color:rgba(255, 255, 0, 0.669366180896759);\">症状</span><span style=\"background-color:rgba(255, 255, 0, 0.6663177013397217);\">感染</span>者</span>407例。\n",
       "以上人员均为隔离管控期间筛查<span style=\"background-color:rgba(255, 255, 0, 0.888239324092865);\">新冠病毒核酸阳性<span style=\"background-color:rgba(255, 255, 0, 0.6663177013397217);\">感染</span></span>者。\n",
       "当前<span style=\"background-color:rgba(255, 255, 0, 0.6673010587692261);\">我市疫情形势</span>严峻，为做好全市<span style=\"background-color:rgba(255, 255, 0, 0.7749382853507996);\">疫情防控工作</span>，尽快恢复正常社会秩序和经济社会发展，长春市新冠肺炎<span style=\"background-color:rgba(255, 255, 0, 0.7749382853507996);\">疫情防控工作</span>领导小组办公室提醒广大市民，\n",
       "请严格遵守<span style=\"background-color:rgba(255, 255, 0, 0.7502512335777283);\">我市疫情防控要求</span>，配合各部门落实好防控措施，进一步提高防范意识，坚持规范戴口罩、勤洗手、常通风、保持社交距离、不聚餐、不聚集，\n",
       "减少疾病<span style=\"background-color:rgba(255, 255, 0, 0.6663177013397217);\">感染</span>风险。一旦出现发热、干咳、乏力、咽痛、嗅味觉减退或丧失等不适<span style=\"background-color:rgba(255, 255, 0, 0.669366180896759);\">症状</span>，应及时向<span style=\"background-color:rgba(255, 255, 0, 0.8164600133895874);\">属地社区（村屯）</span>或疾控机构报告。\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def highlight(text, scores):\n",
    "    for k, v in scores.items():\n",
    "        text = text.replace(k, f'<span style=\"background-color:rgba(255, 255, 0, {v});\">{k}</span>')\n",
    "    from IPython.display import display, HTML\n",
    "    display(HTML(text))\n",
    "\n",
    "scores = HanLP.keyphrase_extraction(doc)\n",
    "highlight(doc, scores)"
   ]
  },
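  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `highlight` function above applies `str.replace` once per keyphrase, so shorter keyphrases are re-wrapped inside longer ones and the source text reaches the page unescaped. The following cell is one possible alternative, sketched under assumptions rather than taken from HanLP: `highlight_html` is a hypothetical helper that escapes the text with the standard `html` module and matches all keyphrases in a single regular-expression pass. As a trade-off, each stretch of text is highlighted at most once, so the nested shading produced above is lost."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import html\n",
    "import re\n",
    "\n",
    "def highlight_html(text, scores):\n",
    "    # Hypothetical helper: return highlighted HTML built in one regex pass.\n",
    "    # Longest keyphrases come first in the alternation, so at any position\n",
    "    # the longest keyphrase starting there is the one that matches.\n",
    "    keys = sorted((k for k in scores if k), key=len, reverse=True)\n",
    "    if not keys:\n",
    "        return html.escape(text)\n",
    "    pattern = re.compile('|'.join(re.escape(k) for k in keys))\n",
    "    parts, last = [], 0\n",
    "    for m in pattern.finditer(text):\n",
    "        # Escape the plain text between matches, then wrap the match itself.\n",
    "        parts.append(html.escape(text[last:m.start()]))\n",
    "        weight = scores[m.group(0)]\n",
    "        parts.append(f'<span style=\"background-color:rgba(255, 255, 0, {weight});\">'\n",
    "                     f'{html.escape(m.group(0))}</span>')\n",
    "        last = m.end()\n",
    "    parts.append(html.escape(text[last:]))\n",
    "    return ''.join(parts)\n",
    "\n",
    "# Small offline demo; to render the real result in this notebook, run:\n",
    "# from IPython.display import display, HTML; display(HTML(highlight_html(doc, scores)))\n",
    "highlight_html('a <b> 确诊病例', {'确诊病例': 0.5})"
   ]
  },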
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 英文\n",
    "按照HanLP一贯的多语种设计，任何语言都支持。由于服务器GPU资源限制，目前英文接口暂未上线。如果你有相应需求，欢迎前往论坛发起请愿。"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "keyphrase_restful.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
